Abstract

Influence maximization is the problem of finding a set of influential users in a social network such 
that the expected spread of influence under a certain propagation model is maximized. Much of the previous
work has neglected the important distinction between social influence and actual product adoption.
However, as recognized in the management science literature, an individual who gets influenced by social 
acquaintances may not necessarily adopt a product (or technology), due, e.g., to monetary concerns. 
In this work, we distinguish between influence and adoption by explicitly modeling the states of being 
influenced and of adopting a product. We extend the classical Linear Threshold (LT) model to incorporate
prices and valuations, and factor them into users' decision-making process of adopting a product.
We show that the expected profit function under our proposed model maintains submodularity under 
certain conditions, but no longer exhibits monotonicity, unlike the expected influence spread function. To 
maximize the expected profit under our extended LT model, we employ an unbudgeted greedy framework 
to propose three profit maximization algorithms. The results of our detailed experimental study on three 
real-world datasets demonstrate that of the three algorithms, PAGE, which assigns prices dynamically
based on the profit potential of each candidate seed, has the best performance both in the expected profit 
achieved and in running time. 



1 Introduction 

The rapidly increasing popularity of online social networking sites such as Facebook, Google+, and Twitter 
has facilitated immense opportunities for large-scale viral marketing. Viral marketing was first introduced to 
the data mining community by Domingos and Richardson [7, 20]; it is a cost-effective method to promote a new
product (or technology) by giving free or discounted samples to a selected group of influential individuals,
in the hope that through the word-of-mouth effects over the social network, a large number of product 
adoptions will occur. 

Motivated by viral marketing, influence maximization (InfMax) has emerged as a fundamental problem 
concerning the propagation of innovations through social networks. In their seminal paper, Kempe et al. [16]
formulated InfMax as a discrete optimization problem: given a directed graph $G = (V, E)$ (representing
a social network) with pairwise influence weights on edges, and a positive number $k$, find $k$ individuals or
seeds, such that by activating them initially, the expected spread of influence (or spread for short) in the
network under a certain propagation model is maximized. Two classical propagation models studied in [16]
are the Independent Cascade (IC) and the Linear Threshold (LT) model. In this paper, we focus on the LT
model, the details of which are deferred to Sec. 2.

* An abbreviated version of this paper appears in the Proceedings of the 12th IEEE International Conference on Data Mining
(ICDM 2012), Brussels, Belgium, December 10-13, 2012. The copyright of the conference version belongs to IEEE.

The expected spread of influence of any set $S \subseteq V$, denoted by $h(S)$, is defined as the expected number
of activated nodes after the diffusion process starting from $S$ quiesces. Under both IC and LT models,
InfMax is NP-hard and the influence function $h$ is monotone and submodular. A set function $f: 2^X \to \mathbb{R}$
is monotone if $f(S) \le f(T)$ whenever $S \subseteq T \subseteq X$, where $X$ is the ground set. The function $f$ is submodular
if $f(S \cup \{x\}) - f(S) \ge f(T \cup \{x\}) - f(T)$ holds for all $S \subseteq T \subseteq X$ and $x \in X \setminus T$. Submodularity naturally
captures the law of diminishing marginal returns, a fundamental principle in microeconomics. With these
good properties, approximation guarantees can be provided for InfMax [16].
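The two definitions above can be checked by brute force on a small ground set. The following is a minimal sketch, not part of the paper's method; the helper names and the toy coverage data `COVERS` are hypothetical and serve only to illustrate the inequalities.

```python
from itertools import combinations

def powerset(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = list(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_monotone(f, ground):
    """Check f(S) <= f(T) for every pair S subset of T."""
    subsets = list(powerset(ground))
    return all(f(s) <= f(t) for s in subsets for t in subsets if s <= t)

def is_submodular(f, ground):
    """Check f(S+x) - f(S) >= f(T+x) - f(T) for S subset of T, x outside T."""
    subsets = list(powerset(ground))
    for s in subsets:
        for t in subsets:
            if not s <= t:
                continue
            for x in set(ground) - t:
                if f(s | {x}) - f(s) < f(t | {x}) - f(t):
                    return False
    return True

# Hypothetical toy data: each element covers a few items.
COVERS = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}

def coverage(s):
    """Number of items covered by the elements of s (monotone, submodular)."""
    covered = set()
    for x in s:
        covered |= COVERS[x]
    return len(covered)

def coverage_minus_cost(s):
    """Coverage minus a per-element cost: still submodular, no longer monotone."""
    return coverage(s) - 2.5 * len(s)
```

On this toy instance, `coverage` passes both checks, while `coverage_minus_cost` remains submodular but is not monotone, since subtracting a linear cost does not affect the diminishing-returns inequality.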

Although InfMax has been studied extensively, a majority of the previous work has focused on the 
classical propagation models, namely IC and LT, which do not fully incorporate important monetary aspects 
in people's decision-making process of adopting new products. The importance of such aspects is seen in 
actual scenarios and recognized in the management science literature. As a real-world example, until recently,
Apple's iPhone had seemingly created a bigger buzz in social media than any other smartphone. However, its
worldwide market share in 2011 fell behind Nokia, Samsung, and LG. This is partly due to the fact that the
iPhone is pricier both in hardware (if one buys it contract-free and factory-unlocked) and in its monthly rate
plans. Conversely, the HP TouchPad attracted little interest in the tablet market when it was initially
priced at $499 (16GB). However, it sold out within a few days after HP dropped the price substantially
to $99 (16GB).

In management science, the adoption of a new product is characterized as a two-step process [15]. In the
first step, "awareness", an individual gets exposed to the product and becomes familiar with its features. In
the second step, "actual adoption", a person who is aware of the product will purchase it if her valuation
outweighs the price. Product awareness is modeled as being propagated through the word-of-mouth of existing
adopters, which is indeed articulated by classical propagation models. However, the actual adoption step is
not captured in these classical models and is indeed the gap between these models and that in [15].

In a real marketing scenario, viral or otherwise, products are priced and people have their own valuations 
for owning them, both of which are critical in making adoption decisions. Precisely, the valuation of a person 
for a certain product is the maximum money she is willing to pay for it; the valuation for not adopting is 
defined to be zero [21]. Thus, when a company attempts to maximize its expected profit in a viral marketing
campaign, such monetary factors need to be taken into account. However, in InfMax, only influence weights 
and network structures are considered, and the marketing strategies are restricted to binary decisions: for 
any node in the network, an InfMax algorithm just decides whether or not it should be seeded. 

To address the aforementioned limitations, we propose the problem of profit maximization (ProMax) 
over social networks, by incorporating both prices and valuations. ProMax is the problem of finding an 
optimal strategy to maximize the expected total profit earned by the end of an influence diffusion process 
under a given propagation model. We extend the LT model to propose a new propagation model named the 
Linear Threshold model with user Valuations (LT-V), which explicitly introduces the states influenced and 
adopting. Every user will be quoted a price by the company, and an influenced user adopts, i.e., transitions 
to adopting, only if the price does not exceed her valuation. 

As pointed out by Kleinberg and Leighton [17], people typically do not want to reveal their valuations
before the price is quoted for reasons of trust. Moreover, for privacy concerns, after a price is quoted, they
usually only reveal their decision of adoption (i.e., "yes" or "no"), but do not wish to share information
about their true valuations. Thus, following the literature [17, 21], we make the independent private value
(IPV) assumption, under which the valuation of each user is drawn independently at random from a certain
distribution. Such distributions can be learned by a marketing company from historical sales data. Furthermore,
our model assumes users to be price-takers who respond myopically to the prices offered to them,
solely based on their privately-held valuations and the price offered.

Since prices and valuations are considered in the optimization, marketing strategies for ProMax require 
non-binary decisions: for any node in the network, we (i.e., the system) need to decide whether or not to 
seed it, and what price should be quoted. Given this factor, the objective function to optimize in ProMax, 



1 The standard notation for the influence function is $\sigma$ [16], but since $\sigma$ is used for the normal distribution $\mathcal{N}(\mu, \sigma^2)$ in the
paper, we use $h$ here.

2 IDC Worldwide Mobile Phone Tracker, July 28, 2011. 



3 http://www.pcworld.com/article/237088/hp_drops_touchpad_price_to_spur_sales.html

i.e., the expected total profit, is a function of both the seed set and the vector of prices. As we will show in
Secs. 3 and 4, since discounting may be necessary for seeds, the profit function is in general non-monotone.
Also, as we show, the profit function maintains submodularity for any fixed vector of prices, regardless of 
the specific forms of valuation distributions. 

In light of the above, ProMax is inherently more complex than InfMax, and calls for more sophisticated
algorithms for its solution. As the profit function is in the form of the difference between a monotone
submodular set function and a linear function, we first design an "unbudgeted" greedy (U-Greedy) framework
for seed set selection. In each iteration, it picks the node with the largest expected marginal profit until the
total profit starts to decline. We show that for any fixed price vector, U-Greedy provides quality guarantees
slightly lower than a $(1 - 1/e)$-approximation. To obtain complete profit maximization algorithms, we propose
to integrate U-Greedy with three pricing strategies, which leads to three algorithms: All-OMP (Optimal Myopic
Prices), FFS (Free-For-Seeds), and PAGE (Price-Aware GrEedy). The first two are baselines and choose prices
in ad hoc ways, while PAGE dynamically determines the optimal price to be offered to each candidate seed in
each round of U-Greedy. Our experimental results on three real-world network datasets illustrate that PAGE
outperforms All-OMP and FFS in terms of expected profit achieved and running time, and is more robust
against various network structures and valuation distributions.

Road-map. Sec. 2 discusses related work. Sec. 3 describes the LT-V model and defines ProMax. Sec. 4
presents our profit maximization algorithms. We discuss experiments in Sec. 5 and present extensions and
conclusions in Sec. 

2 Background and Related Work 

Domingos and Richardson [7, 20] first posed InfMax as a data mining problem. They modeled the problem
using Markov random fields and proposed heuristic solutions. Kempe et al. [16] studied InfMax as a discrete
optimization problem, and utilized submodularity of the spread function $h$ to obtain a greedy $(1 - 1/e)$-approximation
algorithm using the results in [19] (see Algorithm 1, Greedy). Greedy starts from an empty
set; in each iteration it adds to $S$ the element with the largest marginal gain until $|S| = k$.
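The greedy procedure just described can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the oracle `h` is passed in as a plain function, and the toy `REACH` data with its deterministic `spread` function are hypothetical stand-ins for the expected spread, which in practice has to be estimated.

```python
def greedy(nodes, k, h):
    """Pick k seeds, each time adding the node with the largest marginal gain in h."""
    seeds = set()
    for _ in range(min(k, len(nodes))):
        base = h(seeds)
        # Candidates are scanned in list order, so ties go to the earliest node.
        best = max((u for u in nodes if u not in seeds),
                   key=lambda u: h(seeds | {u}) - base)
        seeds.add(best)
    return seeds

# Hypothetical toy oracle: REACH[u] is the set of nodes that u would activate.
REACH = {
    "a": {"a", "b", "c"},
    "b": {"b", "c"},
    "c": {"c"},
    "d": {"d", "e"},
    "e": {"e"},
}

def spread(seeds):
    """Deterministic stand-in for h(S): size of the union of reach sets."""
    covered = set()
    for u in seeds:
        covered |= REACH[u]
    return len(covered)
```

With `k = 2`, the sketch first picks `a` (gain 3) and then `d` (gain 2), because after `a` is chosen the nodes `b` and `c` add nothing new.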

The Linear Threshold Model. We now describe the LT model [16] in detail. In this model, each node
$u_i$ chooses an activation threshold $\theta_i$ uniformly at random from [0,1], representing the minimum weighted
fraction of active in-neighbors necessary so as to activate $u_i$. Each edge $(u_i, u_j) \in E$ is associated with an
influence weight $w_{i,j}$; for each $u_j \in V$, $\sum_{u_i \in N^{in}(u_j)} w_{i,j} \le 1$, where $N^{in}(u_j)$ is the set of in-neighbors of $u_j$
(i.e., the sum of incoming weights does not exceed 1). Time proceeds in discrete steps. At time 0, a seed set
$S$ is activated. At any time $t \ge 1$, we activate any inactive $u_i$ if the total influence weight from its active
in-neighbors reaches or exceeds $\theta_i$. Once a node is activated, it stays active. The diffusion process completes
when no more nodes can be activated.
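A single run of this process, and a Monte-Carlo estimate of the spread built on it, can be sketched as below. The graph encoding (`in_weights[v]` maps each in-neighbor `u` to the weight of edge `(u, v)`) and the function names are assumptions made for illustration. The loop sweeps nodes repeatedly instead of tracking explicit time steps; because activation is progressive, the final active set is the same.

```python
import random

def simulate_lt(in_weights, seeds, rng):
    """Run one LT diffusion to completion and return the set of active nodes."""
    # Each node draws its threshold uniformly at random from [0, 1).
    theta = {v: rng.random() for v in in_weights}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in in_weights:
            if v in active:
                continue
            total = sum(w for u, w in in_weights[v].items() if u in active)
            if total >= theta[v]:
                active.add(v)
                changed = True
    return active

def estimate_spread(in_weights, seeds, runs=2000, seed=0):
    """Monte-Carlo estimate of the expected number of active nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_lt(in_weights, seeds, rng)) for _ in range(runs))
    return total / runs
```

For example, on a hypothetical chain `a -> b -> c` with both weights equal to 1, seeding `a` always activates all three nodes, whereas a single edge of weight 0.5 activates its target in about half of the runs.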

Chen et al. [6] showed that it is #P-hard to compute the exact expected spread of any node set in
general graphs for the LT model. Thus, a common practice is to estimate the spread using Monte-Carlo
(MC) simulations, in which case the approximation ratio of Greedy drops to $1 - 1/e - \epsilon$, where $\epsilon > 0$
depends on the number of MC simulations run [16]. By further exploiting submodularity, [18] proposed the
cost-effective lazy forward (CELF) optimization, which improves the running time of Greedy by up to 700
times.
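The lazy-evaluation idea behind CELF can be sketched with a priority queue: by submodularity, a marginal gain computed in an earlier round is an upper bound on the current gain, so a node needs re-evaluation only when it reaches the top of the queue. The code below is a simplified illustration, not the implementation of [18]; the toy oracle `spread` and the data `REACH` are hypothetical.

```python
import heapq

def celf(nodes, k, h):
    """Lazy greedy selection of k seeds for a monotone submodular oracle h."""
    seeds, current = set(), h(set())
    # Heap entries: (-gain, tie_breaker, node, number of seeds when gain was computed).
    heap = [(-(h({u}) - current), i, u, 0) for i, u in enumerate(nodes)]
    heapq.heapify(heap)
    while heap and len(seeds) < k:
        neg_gain, i, u, stamp = heapq.heappop(heap)
        if stamp == len(seeds):
            # The gain is up to date and tops the heap, so u is the greedy choice.
            seeds.add(u)
            current += -neg_gain
        else:
            # Stale entry: recompute the gain w.r.t. the current seed set and reinsert.
            gain = h(seeds | {u}) - current
            heapq.heappush(heap, (-gain, i, u, len(seeds)))
    return seeds

# Hypothetical toy oracle: REACH[u] is the set of nodes that u would activate.
REACH = {
    "a": {"a", "b", "c"},
    "b": {"b", "c"},
    "c": {"c"},
    "d": {"d", "e"},
    "e": {"e"},
}

def spread(seeds):
    """Deterministic stand-in for h(S): size of the union of reach sets."""
    covered = set()
    for u in seeds:
        covered |= REACH[u]
    return len(covered)
```

On this toy instance the lazy variant selects the same seeds as plain greedy while skipping the re-evaluation of `c` and `e` in the second round.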

Recently, Bhagat et al. [2] addressed the difference between product adoption and influence in their LT-C
(Linear Threshold with Colors) model. In LT-C, the extent to which a node is influenced by its neighbors 
depends on two factors: influence weights and the opinions of the neighbors. LT-C also features a "tattle" 
state for nodes: if an influenced node does not adopt, it may still propagate positive or negative influence to 
neighbors. However, unlike us, the LT-C model does not consider monetary aspects in product adoption. 

Considerable work has been done on pricing in social networks. Hartline et al. [13] studied optimal
marketing for digital goods in social networks and proposed the influence-and-exploit (IE) framework. In IE,
seeds are offered free samples, and the seller can approach other users in a random sequence, bypassing the
network structure. Arthur et al. [1] adopted IE to study a similar problem in which users arrive in a sequence
decided by a cascade model (IC). However, in [1], seeds are given as input (with free samples offered), whereas
in our case, the choices of the seed set and of the prices are driven by profit maximization and are
made by the algorithms. Work in [3, 4] formulated pricing in social networks as simultaneous-move games
and studied equilibria of the games, whereas we focus on stochastic propagation models with social influence.

Algorithm 1: Greedy ($G = (V,E)$, $k$, $h$)

    $S \leftarrow \emptyset$;
    for $i = 1 \to k$ do
        $u \leftarrow \arg\max_{u_i \in V \setminus S} [h(S \cup \{u_i\}) - h(S)]$;
        $S \leftarrow S \cup \{u\}$;
    Output $S$;

Figure 1: Node states in the LT-V model.



3 Linear Threshold Model with User Valuations and Its Properties



In Sec. 3.1, we describe our proposed LT-V model and define the profit maximization problem (ProMax).
We then study a restricted case of ProMax and present theoretical results for it in Sec. 3.2. In Sec. 3.3, we
establish the submodularity result for general ProMax.



3.1 Model and Problem Definition 

In the LT-V model, the social network is modeled as a directed graph $G = (V, E)$, in which each node $u_i \in V$
is associated with a valuation $v_i \in [0, 1]$. Recall that in Sec. 1 we made the IPV assumption under which
valuations are drawn independently at random from some continuous probability distribution assumed known
to the marketing company. Let $F_i(x) = \Pr[v_i \le x]$ be the distribution function of $v_i$ and $f_i(x) = \frac{d}{dx} F_i(x)$
be the corresponding density function. The domain of both functions is [0, 1] as we assume both prices and
valuations are in [0, 1]. As in the classical LT model, each node $u_i$ has an influence threshold $\theta_i$ chosen
uniformly at random from [0, 1]. Each edge $(u_i, u_j) \in E$ has an influence weight $w_{i,j} \in [0, 1]$, such that for
each node $u_j$, $\sum_{u_i \in N^{in}(u_j)} w_{i,j} \le 1$. If $(u_i, u_j) \notin E$, define $w_{i,j} = 0$. Following [7, 20], we assume that there is
a constant acquisition cost $c_a \in [0, 1)$ for marketing to each seed (e.g., rebates, or costs of mailing ads and
coupons).



Diffusion Dynamics. Fig. 1 presents a state diagram for the LT-V model. At any time step, nodes are
in one of the three states: inactive, influenced, and adopting. A diffusion under the LT-V model proceeds in
discrete time steps. Initially, all nodes are inactive. At time 0, a seed set $S$ is targeted and becomes influenced.
Next, every user $u_i$ in the network is offered a price $p_i$ by the system. Let $\mathbf{p} = (p_1, \ldots, p_{|V|}) \in [0, 1]^{|V|}$ denote
a vector of quoted prices, which remains constant throughout the diffusion. Any $u_i \in S$ gets one
chance to adopt (i.e., to enter the adopting state) at step 0 if $p_i \le v_i$; otherwise it stays influenced.

At any time $t \ge 1$, an inactive node $u_j$ becomes influenced if the total influence from its adopting in-neighbors
reaches its threshold, i.e., $\sum_{u_i \in N^{in}(u_j),\ u_i \text{ adopting}} w_{i,j} \ge \theta_j$. Then, $u_j$ will transition to adopting at
$t$ if $p_j \le v_j$, and will stay influenced otherwise. The model is progressive, meaning that all adopting nodes
remain as adopters and no influenced node will ever become inactive. The diffusion ends if no more nodes
can change states.
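The dynamics above can be sketched as a single simulated run. This is a minimal illustration under assumptions: valuations are passed in as already-sampled numbers, `in_weights[v]` maps each in-neighbor `u` to the weight of edge `(u, v)`, and all function and parameter names are hypothetical. As in the model, only adopting nodes contribute influence.

```python
import random

def simulate_ltv(in_weights, seeds, prices, valuations, rng):
    """One LT-V run; returns the sets of influenced and adopting nodes."""
    theta = {v: rng.random() for v in in_weights}
    influenced = set(seeds)
    # Seeds get their single chance to adopt at step 0.
    adopting = {u for u in seeds if prices[u] <= valuations[u]}
    changed = True
    while changed:
        changed = False
        for v in in_weights:
            if v in influenced:
                continue
            # Only adopting in-neighbors propagate influence.
            total = sum(w for u, w in in_weights[v].items() if u in adopting)
            if total >= theta[v]:
                influenced.add(v)
                changed = True
                if prices[v] <= valuations[v]:
                    adopting.add(v)
    return influenced, adopting

def realized_profit(adopting, seeds, prices, acquisition_cost):
    """Profit of one run: revenue from adopters minus the per-seed acquisition cost."""
    return sum(prices[u] for u in adopting) - acquisition_cost * len(seeds)
```

On a hypothetical chain `a -> b -> c` with unit weights, if the seed `a` adopts but `b` is quoted a price above its valuation, then `b` ends up influenced without adopting and `c` stays inactive, which is precisely the gap between influence and adoption that the model captures.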

Following [15], we assume that only adopting nodes propagate influence, as adopters can release
experience-related product features (e.g., durability, usability), making their recommendations more effective 
in removing doubts of inactive users. This distinguishes our model from LT-C [2], and in fact, the extensions 
to the LT model employed in LT-C and in LT-V are orthogonal and address different aspects in propagations 
of influence and adoption. 

Formally, we define $\pi: 2^V \times [0, 1]^{|V|} \to \mathbb{R}$ to be the profit function such that $\pi(S, \mathbf{p})$ is the expected (total)
profit earned by the end of a diffusion process under the LT-V model, with $S$ as the seed set and $\mathbf{p}$ as the
vector of prices. The problem studied in the paper is as follows.

Definition 1 (Profit maximization (ProMax)). Given an instance of the LT-V model consisting of a graph
$G = (V, E)$ with edge weights, find the optimal pair of a seed set $S$ and a price vector $\mathbf{p}$ that maximizes the
expected profit $\pi(S, \mathbf{p})$.

The Virtual Mechanism and Its Truthfulness Guarantee. Recall that users are assumed to be 
price-takers making adoption decisions just by comparing the quoted price to their valuation. Thus, it is 
natural to ask: would an influenced user be better off by acting strategically, i.e., by not deciding solely by 
comparing her true valuation to the price? In other words, for any pricing strategy used by a company, is it 
robust against strategic behaviors of users? 

In fact, the price-taking procedure in LT-V can be structured as a virtual mechanism that we show is 
truthful, and hence the dominant strategy for all users is to use true valuation. It is worth emphasizing that 
the mechanism is virtual since in our model, the company need not run it and users will not be asked to
declare their valuation. 

Definition 2 (The Virtual Mechanism). An influenced user $u_i$ declares some valuation to the company;
then $u_i$ is sold the product at price $p_i$ if $p_i$ is no greater than the declared valuation, and not sold otherwise.

Theorem 1 (Truthfulness of the Virtual Mechanism). The mechanism defined in Definition 2 is truthful.
That is, the utility any user $u_i$ gets by declaring any number $\hat{v}_i \ne v_i$ is no greater than that she gets by
declaring $v_i$ truthfully.

Proof. Consider the case that the true valuation $v_i < p_i$. Then, if reporting $v_i$ truthfully, $u_i$ will not adopt,
and hence her utility is 0. Suppose that $u_i$ reports lower ($\hat{v}_i < v_i$); she still would not get the product
and the utility is still 0. Suppose otherwise ($u_i$ reports higher: $\hat{v}_i > v_i$). Then, if $\hat{v}_i < p_i$, the situation is
the same, in which $u_i$ gets zero utility. If $\hat{v}_i \ge p_i$, then $u_i$ ends up paying $p_i$ to adopt and having a negative
utility, since $v_i - p_i < 0$.

Then, consider the case that $v_i \ge p_i$, in which if $u_i$ reports truthfully, she will adopt the product by
paying $p_i$ and enjoy a non-negative utility $v_i - p_i$. Suppose that $u_i$ reports higher ($\hat{v}_i > v_i$); then she still
gets to pay $p_i$ and has utility $v_i - p_i$. Suppose otherwise ($u_i$ reports lower: $\hat{v}_i < v_i$). Then, if $\hat{v}_i$ is still no less
than $p_i$, she still pays $p_i$ and has utility $v_i - p_i$, while if $\hat{v}_i$ happens to be lower than $p_i$, she will not buy it
and has zero utility. This completes the proof. □
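The case analysis in the proof can also be checked numerically. The sketch below is only an illustration: it encodes the utility a user obtains under the virtual mechanism and compares truthful reporting against every declaration on a grid. The function names and the grid are hypothetical.

```python
def utility(true_value, declared_value, price):
    """Utility under the virtual mechanism: sold at `price` iff price <= declared value."""
    return true_value - price if price <= declared_value else 0.0

def truthful_is_best(true_value, price, declarations):
    """True if no declaration yields strictly more utility than the true valuation."""
    honest = utility(true_value, true_value, price)
    return all(utility(true_value, d, price) <= honest for d in declarations)

# Grid of valuations, prices, and declarations in [0, 1] with step 0.05.
GRID = [i / 20 for i in range(21)]
```

Evaluating `truthful_is_best(v, p, GRID)` for every valuation `v` and price `p` in `GRID` returns `True` in all cases, in line with Theorem 1.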

3.2 A Restricted Special Case of ProMax under LT-V 

To better understand the properties of the LT-V model and the hardness of ProMax, we first study a 
special case of the problem. We assume the valuation distributions degenerate to an identical single-point,
i.e., $\forall u_i \in V$, $v_i = p$ for some $p \in (0, 1]$ with probability 1. As mentioned in Sec. 1, this is usually not the
case; the degeneration assumption here is of theoretical interest only.

For simplicity, we also assume that for every $u_i \in S$, the quoted price $p_i = 0$. Since valuation is the
maximum money one is willing to pay for the product, in this case, the optimal pricing strategy is to set
$p_j = p$, $\forall u_j \in V \setminus S$. The situation amounts to restricting the marketing strategy to a binary one: free sample
($p_i = 0$) for seeds and full price for non-seeds ($p_j = p$). Given this pricing strategy, once a node is influenced,
it transitions to adopting with probability 1. Thus, ProMax boils down to a problem to determine a seed
set $S$, and the profit function $\pi(S, \mathbf{p})$ reduces to a set function $\pi(S)$, since $\mathbf{p}$ is uniquely determined given $S$:

$$\pi(S) = p \cdot (h_L(S) - |S|) - c_a |S| = p \cdot h_L(S) - (p + c_a) |S|, \qquad (1)$$

where $h_L(S)$ is the expected number of adopting nodes under the LT-V model by seeding $S$.

In general, the degenerated profit function $\pi$ is non-monotone. To see this, let $u$ be any seed that provides
a positive profit. Now, clearly $\pi(\emptyset) = 0 < \pi(\{u\})$ but $\pi(V) < 0 < \pi(\{u\})$, as giving free samples to the whole
network will result in a loss of $c_a |V|$ on account of seeding expenses. Since $\pi$ is non-monotone, unlike InfMax,
it is natural to not use a budget $k$ for the number of seeds, but instead ask for a seed set of any size that
results in the maximum expected profit. In other words, the number of seeds to be chosen, $k$, is not preset,
but is rather determined by a solution. This restricted case of ProMax is to find $S = \arg\max_{T \subseteq V} \pi(T)$,
which we show is NP-hard.
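As a small worked illustration of Eq. (1) and of the non-monotonicity argument, the sketch below evaluates the restricted profit for given values of $h_L(S)$ and $|S|$. The function name and the numbers are hypothetical.

```python
def restricted_profit(expected_adopters, num_seeds, price, acquisition_cost):
    """Eq. (1): pi(S) = p * h_L(S) - (p + c_a) * |S|."""
    return price * expected_adopters - (price + acquisition_cost) * num_seeds
```

For instance, with $p = 0.5$ and $c_a = 0.1$ on a 10-node network, two seeds that lead to 10 expected adopters give a profit of $0.5 \cdot 10 - 0.6 \cdot 2 = 3.8$, whereas seeding all 10 nodes gives $0.5 \cdot 10 - 0.6 \cdot 10 = -1 = -c_a |V|$.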

Theorem 2. The Restricted ProMax problem (RPM) is NP-hard for the LT-V model. 

Proof. Given an instance of the NP-hard Minimum Vertex Cover (MVC) problem, we can construct
an instance of the ProMax problem, such that an optimal solution to the ProMax problem gives an
optimal solution to the MVC problem. Consider an instance of MVC defined by an undirected $n$-node graph
$G = (V, E)$; we want to find a set $S$ such that $|S| = k$ and $k$ is the smallest number such that $G$ has a vertex
cover (VC) of size $k$.

The corresponding instance of RPM is as follows: first, we direct all edges in $G$ in both directions to
obtain a directed graph $G' = (V, E')$, where $E'$ is the set of all directed edges. Then, for each $u_i \in V$, set
$\theta_i = 1$; for each $(u_i, u_j) \in E'$, define $w_{i,j} = 1/d^{in}(u_j)$, where $d^{in}(u_j)$ is the in-degree of $u_j$ in $G'$. Lastly, set
$p = 1$ and $c_a = 0$, in which case $\pi(S) = h_L(S) - |S|$. Now, we want to show that a set $S \subseteq V$ is a minimum
vertex cover (MVC) of $G$ if and only if $S = \arg\max_{T \subseteq V} \pi(T)$.

($\Rightarrow$). If $S$ is an MVC of $G$, then in ProMax we choose $S$ as the seed set, so that $\pi(S) = n - |S|$. This
is optimal, shown by contradiction. Suppose otherwise, i.e., there exists some $T \subseteq V$, $T \ne S$, such that
$\pi(T) > \pi(S)$. For the case of $|T| \ge |S|$, we have $\pi(T) = h_L(T) - |T| \le h_L(T) - |S|$. Since $h_L(T) \le n$,
$\pi(T) \le h_L(T) - |S| \le n - |S| = \pi(S)$, which is a contradiction. For the case of $|T| < |S|$, let $|S| - |T| = \ell$.
Thus, $\pi(T) = h_L(T) - (|S| - \ell)$. Since $T$ is not a VC, $h_L(T) \ne n$, and it is supposed that $\pi(T) > \pi(S)$, we
have $h_L(T) = n - j$, for some $j \in [1, \ell)$. Then, from the way in which influence weights and thresholds are
set up, we know there are exactly $j$ nodes in $V \setminus T$ that are not activated. Let $J$ be the set containing those
$j$ nodes, and consider the set $T' = T \cup J$, for which we have $h_L(T') = n$. From the proof of Theorem 2.7
of [16], $T'$ is a VC of $G$. But since $|T'| = |T| + j < |S|$, $T'$ is a VC with a strictly smaller size than $S$, which
gives a contradiction since $S$ is an MVC.

($\Leftarrow$). Suppose that $S = \arg\max_{T \subseteq V} \pi(T)$, but $S$ is not a VC of $G$ (we will consider MVC later). This
implies that there exists at least one edge $e \in E$ such that both endpoints of $e$, denoted by $u_i$ and $u_j$, are not
in $S$. From the way in which influence weights and thresholds are set up in $G'$, we know both $u_i$ and $u_j$ are
not activated. Thus, if we add either one of them, say $u_i$, into $S$, $h_L(S \cup \{u_i\})$ is at least $h_L(S) + 2$, and thus
$\pi(S \cup \{u_i\}) - \pi(S) \ge 1$, which contradicts the fact that $S$ optimizes $\pi$. Hence, $S$ must be a VC of $G$. Now
suppose that in addition $S$ is not an MVC. Then, there must exist some $x \in S$ such that the node-set $S \setminus \{x\}$
is still a VC of $G$; this means that $h_L(S \setminus \{x\}) = n$, too. Thus, $\pi(S \setminus \{x\}) = n - |S| + 1 > \pi(S) = n - |S|$,
which is a contradiction. Hence, $S$ is indeed an MVC of $G$.

Now we have shown that an optimal solution to the restricted ProMax problem is an optimal solution
to the Minimum Vertex Cover problem, and vice versa; this completes the proof. □

Algorithm 2: U-Greedy ($G = (V,E)$, $\pi$)

    $S \leftarrow \emptyset$;
    while true do
        $u \leftarrow \arg\max_{u_i \in V \setminus S} [\pi(S \cup \{u_i\}) - \pi(S)]$;
        if $\pi(S \cup \{u\}) - \pi(S) > 0$ then
            $S \leftarrow S \cup \{u\}$;
        else break;
    Output $S$;



Observe that both components of $\pi$, $h_L(S)$ and $-|S|$, are submodular, which leads to the submodularity
of $\pi$ as it is a non-negative linear combination of two submodular functions. However, unlike for InfMax,
the function is non-monotone and we want to find a set $S$ of any size that maximizes $\pi(S)$, so the standard
Greedy is not applicable here. In [8], Feige et al. gave a randomized local search (2/5-approximation) for
maximizing general non-monotone submodular functions. This is applicable to $\pi$, but has time complexity
$O(|V|^3 |E| / \epsilon)$, where $(1 + \epsilon/|V|^2)$ is the per-step improvement factor in the search. By contrast, since the function
$\pi$ is the difference between a monotone submodular function and a linear function, we propose a greedy
approach (Algorithm 2, U-Greedy) with time complexity $O(|V|^2 |E|)$ and a better approximation ratio, which
is slightly lower than $1 - 1/e$. U-Greedy grows the seed set $S$ in a greedy fashion similar to Greedy, and
terminates when no node can provide positive marginal gain w.r.t. $S$.
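The unbudgeted greedy loop can be sketched as follows. This is an illustration under assumptions, not the authors' code: the profit oracle is passed in as a function, and the toy instance (reach sets `REACH`, price 0.5, acquisition cost 0.1) is hypothetical, with a deterministic count standing in for $h_L$.

```python
def u_greedy(nodes, profit):
    """Add the node with the largest marginal profit until no node has positive gain."""
    seeds = set()
    while True:
        candidates = [u for u in nodes if u not in seeds]
        if not candidates:
            break
        base = profit(seeds)
        best = max(candidates, key=lambda u: profit(seeds | {u}) - base)
        if profit(seeds | {best}) - base > 0:
            seeds.add(best)
        else:
            break
    return seeds

# Hypothetical toy instance: REACH[u] is the set of nodes that adopt if u is seeded.
REACH = {
    "a": {"a", "b", "c"},
    "b": {"b", "c"},
    "c": {"c"},
    "d": {"d", "e"},
    "e": {"e"},
}
PRICE, ACQ_COST = 0.5, 0.1

def toy_profit(seeds):
    """Restricted profit of Eq. (1) with a deterministic stand-in for h_L."""
    adopters = set()
    for u in seeds:
        adopters |= REACH[u]
    return PRICE * len(adopters) - (PRICE + ACQ_COST) * len(seeds)
```

Here the loop adds `a` (marginal profit 0.9) and `d` (0.4), then stops because every remaining node would reduce the profit. No seed budget is supplied; the number of seeds is determined by the stopping rule.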

Theorem 3. Given an instance of the restricted ProMax problem under the LT-V model consisting of
a graph $G = (V, E)$ with edge weights and objective function $\pi$, let $S_g \subseteq V$ be the seed set returned by
Algorithm 2 and $S^* \subseteq V$ be the optimal solution. Then,

$$\pi(S_g) \ge (1 - 1/e) \cdot \pi(S^*) - \Theta(\max\{|S_g|, |S^*|\}). \qquad (2)$$

Proof. Case (i). If $|S^*| \le |S_g|$, then since $h_L$ is monotone and submodular, $h_L(S_g) \ge (1 - 1/e) \cdot h_L(S^*)$.
Thus, by the definition of $\pi$, we have

$$\pi(S_g) = p \cdot h_L(S_g) - (p + c_a) |S_g|$$
$$\ge p (1 - 1/e) \cdot h_L(S^*) - (p + c_a) |S_g|$$
$$= (1 - 1/e) \cdot \pi(S^*) - (p + c_a) |S_g| + (1 - 1/e)(p + c_a) |S^*|$$
$$= (1 - 1/e) \cdot \pi(S^*) - \Theta(|S_g|).$$

Case (ii). If $|S^*| > |S_g|$, consider a set $S'_g$ obtained by running U-Greedy until $|S'_g| = |S^*|$. Clearly, from
case (i), we have $\pi(S'_g) \ge (1 - 1/e) \cdot \pi(S^*) - \Theta(|S'_g|)$. Due to the fact that $|S^*| = |S'_g| > |S_g|$, and $S_g$ is
obtained by running U-Greedy until no node can provide positive marginal profit, we have $\pi(S_g) \ge \pi(S'_g) \ge
(1 - 1/e) \cdot \pi(S^*) - \Theta(|S^*|)$. Combining the above two cases gives Eq. (2). □

Theorem 3 indicates that the gap between the U-Greedy solution and a $(1 - 1/e)$-approximation grows
linearly w.r.t. the cardinality of the seed set. Since this cardinality is typically much smaller than the expected
spread, U-Greedy can provide quality guarantees for restricted ProMax with objective function $\pi$.

3.3 Properties of the LT-V Model in the General Case 

Theorem 2 shows that in a restricted setting where exact valuations are known and the optimal pricing
strategy is trivial, ProMax is still NP-hard. Now we consider the general ProMax described in Sec. 3.1
and show that for any fixed price vector, the general profit function maintains submodularity (w.r.t. the seed 
set) regardless of the specific forms of the valuation distributions. 

Given a seed set $S$ and a price vector $\mathbf{p}$, let $ap(u_i | S, \mathbf{p})$ denote $u_i$'s adoption probability, defined as the
probability that $u_i$ adopts the product by the end of the diffusion started with seed set $S$ and price vector
$\mathbf{p}$. Similarly, let $ip(u_i | S, \mathbf{p}_{-i})$ denote $u_i$'s probability of getting influenced under the same initial conditions,
where $\mathbf{p}_{-i} \in [0, 1]^{|V|-1}$ is the vector of all prices excluding $p_i$. Also, let $\pi^{(i)}(S, \mathbf{p})$ be the expected profit
earned from $u_i$. By model definition, for any $u_i \in V \setminus S$, we have $ap(u_i | S, \mathbf{p}) = ip(u_i | S, \mathbf{p}_{-i}) \cdot (1 - F_i(p_i))$
and $\pi^{(i)}(S, \mathbf{p}) = p_i \cdot ap(u_i | S, \mathbf{p})$. If $u_i \in S$, $ip(u_i | S, \mathbf{p}_{-i}) = 1$ and $\pi^{(i)}(S, \mathbf{p}) = p_i \cdot (1 - F_i(p_i)) - c_a$.

By linearity of expectations, we have $\pi(S, \mathbf{p}) = \sum_{u_i \in V} \pi^{(i)}(S, \mathbf{p})$. Hence, to analyze the profit function,
we just need to focus on the adoption probability, in which the factor $(1 - F_i(p_i))$ does not depend on $S$,
but $ip(u_i | S, \mathbf{p}_{-i})$ calls for careful analysis, which we will present in the proof of Theorem 4.

Let $\mathbf{v} = (v_1, \ldots, v_{|V|}) \in [0, 1]^{|V|}$ be a vector of user valuations, corresponding to random samples drawn
from the various user valuation distributions. We now have:

Theorem 4 (Submodularity). Given an instance of the LT-V model, for any fixed vector $\mathbf{p} \in [0, 1]^{|V|}$ of
prices, the profit function $\pi(S, \mathbf{p})$ is submodular w.r.t. $S$, for an arbitrary vector $\mathbf{v}$ of valuation samples.

The proof of submodularity of the influence spread function $h$ in the classical LT model [16] relies on
establishing an equivalence between the LT model and reachability in a family of random graphs generated
as follows: for each node $u_i \in V$, select at most one of its incoming edges at random, such that $(u_j, u_i)$ is
selected with probability $w_{j,i}$, and no edge is selected with probability $1 - \sum_{u_j \in N^{in}(u_i)} w_{j,i}$. We will use a
similar approach in the proof of Theorem 4.

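Proof of Theorem 4. By linearity of expectation as well as the above analysis on adoption probabilities,
$\pi(S, \mathbf{p}) = \sum_{u_i \in V} \pi^{(i)}(S, \mathbf{p}) = \sum_{u_i \in S} [p_i (1 - F_i(p_i)) - c_a] + \sum_{u_i \notin S} p_i (1 - F_i(p_i)) \cdot ip(u_i | S, \mathbf{p}_{-i})$. Since the first
sum is linear in $S$, it suffices to show that $ip(u_i | S, \mathbf{p}_{-i})$ is submodular in $S$, whenever $u_i \notin S$.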

To encode random events of the LT-V model using the possible world semantics, we do the following.
First, we run a node coloring process on $G$: for each node $u_i$, if $p_i \le v_i$, color it black; otherwise color it
white. Meanwhile, we run a live-edge selection process following the aforementioned protocol [16]. Note that
the two processes are orthogonal and independent of each other. Combining the results of both leads to
a colored live-edge graph, which we call a possible world $X$. Let $\mathcal{X}$ be the probability space in which each
sample point specifies one such possible world $X$.

Next, we define the notion of "black-reachability". In any possible world $X$, a node $u_i$ is black-reachable
from a node set $S$ if and only if there exists a black node $s \in S$ such that $u_i$ is reachable from $s$ via a
path consisting entirely of black nodes, except possibly for $u_i$ (even if $u_i$ is white, it is still considered
black-reachable since here we are interested in the probability of being influenced, not adopting). From the same
argument in the proof of Claim 2.6 of [16], on any black-white colored graph, the following two distributions
over the sets of nodes are the same: (1) the distribution over sets of influenced nodes obtained by running the
LT-V process to completion starting from $S$; (2) the distribution over sets of nodes that are black-reachable
from $S$, under the live-edge selection protocol.

Let $I_X(u_i | S)$ be the indicator set function such that it is 1 if $u_i$ is black-reachable from $S$, and 0 otherwise.
Consider two sets $S$ and $T$ with $S \subseteq T \subseteq V$, and a node $x \in V \setminus T$. Consider some $u_i$ that is black-reachable
from $T \cup \{x\}$ but not from $T$. This implies (1) $u_i$ is not black-reachable from $S$ either (otherwise, $u_i$ would also
be black-reachable from $T$, which is a contradiction); (2) the source of the path that "black-reaches" $u_i$ must
be $x$. Hence, $u_i$ is black-reachable from $S \cup \{x\}$, but not from $S$, which implies $I_X(u_i | S \cup \{x\}) - I_X(u_i | S) = 1 \ge
1 = I_X(u_i | T \cup \{x\}) - I_X(u_i | T)$. Thus, $I_X(u_i | S)$ is submodular. Since $ip(u_i | S, \mathbf{p}_{-i}) = \sum_{X \in \mathcal{X}} \Pr[X] \cdot I_X(u_i | S)$
is a nonnegative linear combination of submodular functions, this completes the proof. □

We also remark that in general graphs, given any $S$ and $\mathbf{p}$, it is #P-hard to compute the exact value of
$\pi(S, \mathbf{p})$ for the LT-V model, just as in the case of computing the exact expected spread of influence for the
LT model. This can be shown using a proof similar to the one for Theorem 1 in [6].



4 Profit Maximization Algorithms 

For ProMax, since the expected profit is a function of both the seed set and the vector of prices, a ProMax
algorithm should determine both the seed set and an assignment of prices to nodes to optimize the expected
profit. Accordingly, it has two components: (1) a seed selection procedure that determines $S$, and (2) a
pricing strategy that determines $\mathbf{p}$. Due to acquisition costs and the possible need for seed-discounting (details
later), $\pi(S, \mathbf{p})$ is still non-monotone in $S$ and has the form of the difference between a monotone submodular
function and a linear function. Hence, inspired by the restricted ProMax studied in Sec. 3.2, we propose to use
U-Greedy for seed set selection.

Algorithm 3: All-OMP ($G = (V, E)$, $\pi$, $F_i$ ($\forall u_i \in V$))

1 $S \leftarrow \emptyset$; $\mathbf{p}^m \leftarrow \mathbf{0}$;
2 foreach $u_i \in V$ do
3 |   $\mathbf{p}^m[i] \leftarrow p_i^m = \arg\max_{p \in [0,1]} p \cdot (1 - F_i(p))$;
4 while true do
5 |   $u \leftarrow \arg\max_{u_i \in V \setminus S} [\pi(S \cup \{u_i\}, \mathbf{p}^m) - \pi(S, \mathbf{p}^m)]$;
6 |   if $\pi(S \cup \{u\}, \mathbf{p}^m) - \pi(S, \mathbf{p}^m) > 0$ then $S \leftarrow S \cup \{u\}$;
7 |   else break;
8 Output $S$, $\mathbf{p}^m$;

We then propose three pricing strategies and integrate them with U-Greedy to obtain three ProMax
algorithms. The first two, All-OMP and FFS, are baselines with simple strategies that set prices of seeds 
without considering the network structure and influence spread, while the third one, PAGE, computes optimal 
discounts for candidate seeds based on their "profit potential". Intuitively, it "rewards" seeds with higher 
influence spread by giving them a deeper discount to boost their adoption probabilities, and in turn the 
adoption probabilities of nodes that may be influenced directly or indirectly by such seeds. 

Notice that taking valuations into account when modeling the diffusion process of product adoption makes 
a difference for a marketing company. A pricing strategy that does not consider valuations is limited: either 
it charges everyone full price (or at best gives full discount to the seeds), or it uses an ad-hoc discount policy 
which is necessarily suboptimal. By contrast, PAGE makes full use of valuation information to determine 
the best discounts. 

4.1 Two Baseline Algorithms: All-OMP and FFS 

Recall that in our model, users in the social network are price-takers who myopically respond to the price
offered to them. Thus, given a distribution function $F_i$ of valuation $v_i$, the optimal myopic price (OMP) [13]
can be calculated by:

$$p_i^m = \arg\max_{p \in [0,1]} \; p \cdot (1 - F_i(p)). \qquad (3)$$

Offering OMP to a single influenced node ensures that the expected profit earned solely from that node
is the maximum. This gives our first ProMax algorithm, All-OMP, which offers OMP to all nodes regardless
of whether a node is a seed or how influential it is. First, for each $u_i \in V$, it calculates $p_i^m$ using Eq. (3), and
populates all OMPs to form the price vector $\mathbf{p}^m = (p_1^m, \ldots, p_{|V|}^m)$. Then, treating $\mathbf{p}^m$ as fixed, it essentially runs
U-Greedy (Algorithm 2) to select the seeds. When the algorithm cannot find a node of which the marginal
profit is positive, it stops.
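For illustration, Eq. (3) can be evaluated numerically for the two valuation distributions considered in this paper. The sketch below uses a plain grid search, which is a simplification chosen for readability and not a method prescribed here; the function names and the grid resolution are assumptions.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def optimal_myopic_price(cdf, steps=10_000):
    """Grid search for argmax over p in [0, 1] of p * (1 - F(p)), as in Eq. (3)."""
    best_p, best_rev = 0.0, 0.0
    for k in range(steps + 1):
        p = k / steps
        rev = p * (1.0 - cdf(p))   # expected myopic revenue at price p
        if rev > best_rev:
            best_p, best_rev = p, rev
    return best_p

omp_uniform = optimal_myopic_price(lambda p: p)                        # U(0, 1)
omp_normal = optimal_myopic_price(lambda p: normal_cdf(p, 0.53, 0.14)) # N(0.53, 0.14^2)
```

Under these assumptions the uniform case returns 0.5 and the normal case returns a value close to 0.41, in line with the OMP values quoted in Sec. 5.2.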

Notice that Eq. (3) overlooks the network structure and ignores the profit potential of seeds. This may
lead to the sub-optimality of All-OMP in general. Fig. 2 illustrates this with an example. Suppose that all
valuations are distributed uniformly in $[0, 1]$ and the acquisition cost $c_a = 0.001$. Hence, $\mathbf{p}^m = (1/2, \ldots, 1/2)$.
Consider seeding node 1: it adopts w.p. 0.5, giving a profit of $0.5 + 5 \cdot 0.5^3 - 0.001 = 1.124$; it does not adopt
w.p. 0.5, resulting in a profit of $-0.001$. Thus, the expected profit $\pi(\{1\}, \mathbf{p}^m) = 0.5615$. However, when
$p_1 = 3/16$, $\pi(\{1\}, \mathbf{p}^m_{-1} \oplus (3/16)) = 0.661$.⁴ This shows that for high-influence networks and low acquisition
cost, the profit earned by running All-OMP can be improved by seed-discounting, i.e., lowering prices for
seeds so as to boost their adoption probabilities and thus better leverage their influence over the network.
The intuition is that the profit loss over seeds (stemming from the discount) can potentially be compensated
and even surpassed by the profit gain over non-seeds: more seeds may adopt as a result of the discount and
the probabilities of non-seeds getting influenced will go up as more seeds adopt.

⁴ We use $\mathbf{p}_{-i} \oplus x$ to denote a vector sharing all values with $\mathbf{p}$ except that the $i$-th coordinate is replaced by $x$.

Figure 2: An example graph.

Algorithm 4: FFS ($G = (V, E)$, $\pi$, $F_i$ ($\forall u_i \in V$))

1 $S \leftarrow \emptyset$; $\mathbf{p}^f \leftarrow \mathbf{0}$;
2 foreach $u_i \in V$ do
3 |   $\mathbf{p}^f[i] \leftarrow p_i^m = \arg\max_{p \in [0,1]} p \cdot (1 - F_i(p))$;
4 while true do
5 |   $u \leftarrow \arg\max_{u_i \in V \setminus S} [\pi(S \cup \{u_i\}, \mathbf{p}^f_{-i} \oplus 0) - \pi(S, \mathbf{p}^f)]$;
6 |   if $\pi(S \cup \{u\}, \mathbf{p}^f_{-u} \oplus 0) - \pi(S, \mathbf{p}^f) > 0$ then
7 |   |   $S \leftarrow S \cup \{u\}$; $\mathbf{p}^f \leftarrow \mathbf{p}^f_{-u} \oplus 0$;
8 |   else break;
9 Output $S$, $\mathbf{p}^f$

Generally speaking, there exists a trade-off between the immediate (myopic) profit earned from seeds and
the potentially larger profit earned from non-seeds. Favoring the latter, we propose our second algorithm, FFS
(Free-For-Seeds), which gives a full discount to seeds and charges non-seeds the OMP. FFS first calculates
$\mathbf{p}^m = (p_1^m, \ldots, p_{|V|}^m)$ using Eq. (3). Then it runs U-Greedy: in each iteration, it adds to $S$ the node which
provides the largest marginal profit when a full discount (i.e., price 0) is given. For all seeds added, their
prices remain 0; the algorithm ends when no node can provide positive marginal profit.

Since FFS has a completely opposite attitude towards seed-discounting compared to All-OMP, intuitively,
it should be suitable for high-influence networks and low acquisition costs, but it may be overly aggressive
for low-influence networks and high acquisition costs. For example, in Fig. 2, the FFS profit by seeding node
1 is 0.625, better than the All-OMP profit 0.5615. But if all influence weights are 0.01 instead of 0.5, and
$c_a = 0.01$, All-OMP gives a profit of 0.246, while FFS gives only 0.0025.
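The arithmetic behind these comparisons can be retraced with a few lines of Python. The sketch assumes, based only on the term $5 \cdot 0.5^3$ in the text (the figure itself is not reproduced here), that node 1 has five out-neighbors with equal influence weight, that every non-seed is charged the OMP of 0.5, and that valuations are uniform on $[0, 1]$; the function names are hypothetical.

```python
def downstream_profit(weight, n_neighbors=5, omp=0.5):
    """Expected profit from the neighbors once the seed has adopted: each is
    influenced w.p. `weight`, then adopts w.p. 1 - omp and pays omp."""
    return n_neighbors * weight * (1.0 - omp) * omp

def seed_profit(price, y1, c_a):
    """Expected profit from seeding one node at `price` under U(0, 1)
    valuations, so that the seed adopts with probability 1 - price."""
    return (1.0 - price) * (price + y1) - c_a

y1_high = downstream_profit(0.5)                       # 5 * 0.5**3
all_omp_high = seed_profit(0.5, y1_high, c_a=0.001)    # seed pays the OMP
discount_high = seed_profit(3 / 16, y1_high, c_a=0.001)  # discounted seed
ffs_high = seed_profit(0.0, y1_high, c_a=0.001)        # free sample

y1_low = downstream_profit(0.01)
all_omp_low = seed_profit(0.5, y1_low, c_a=0.01)
ffs_low = seed_profit(0.0, y1_low, c_a=0.01)
```

Under these assumptions the All-OMP values (0.5615 and about 0.246) and the low-influence FFS value (0.0025) are reproduced exactly, while the discounted and free-sample values in the high-influence case come out at about 0.659 and 0.624, within a few thousandths of the figures quoted in the text.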

4.2 The PAGE Algorithm 

Both All-OMP and FFS are easy for marketing companies to operate, but they are not balanced and are
not robust against different input instances, as illustrated by the examples above. To achieve more balance, we
propose the PAGE (for Price-Aware GrEedy) algorithm (Algorithm 5). PAGE also employs U-Greedy to select
seeds. It initializes all seed prices to their OMP values (Step 3). In each round, it calculates the best price
for each candidate seed such that its marginal profit (MP) w.r.t. the chosen $S$ and $\mathbf{p}$ is maximized (Step
7); then it picks the node with the largest maximum MP (Step 8). It stops when it cannot find a seed with
a positive MP (Step 11). For all non-seed nodes, PAGE still charges OMP. We next explain the details of
determining the best price for a candidate seed.

In the $\oplus$ notation of footnote 4, for example, if $\mathbf{p} = (0.2, 0.3, 0.4)$, then $\mathbf{p}_{-1} \oplus 0.5 = (0.5, 0.3, 0.4)$.

Given a seed set $S$, consider an arbitrary candidate seed $u_i \in V \setminus S$, with its price $p_i$ to be determined. The
marginal profit (MP) that $u_i$ provides w.r.t. $S$ with $p_i$ is $MP(u_i) = \pi(S \cup \{u_i\}, \mathbf{p}_{-i} \oplus p_i) - \pi(S, \mathbf{p}_{-i} \oplus p_i^m)$,
where $\mathbf{p}_{-i}$ is fixed. The key task in PAGE is to find $p_i$ such that $MP(u_i)$ is maximized. Since $\pi(S, \mathbf{p}_{-i} \oplus p_i^m)$
does not involve $u_i$ and $p_i$, it suffices to find $p_i$ that maximizes $\pi(S \cup \{u_i\}, \mathbf{p}_{-i} \oplus p_i)$.

Seeding $u_i$ at a certain price $p_i$ results in two possible worlds: world $X_1^{(i)}$ with $\Pr[X_1^{(i)}] = 1 - F_i(p_i)$, in
which $u_i$ adopts, and world $X_0^{(i)}$ with $\Pr[X_0^{(i)}] = F_i(p_i)$, in which $u_i$ does not adopt. In world $X_1^{(i)}$, the profit
earned from $u_i$ is $p_i - c_a$; let the expected profit earned from other nodes be $Y_1$. Similarly, in world $X_0^{(i)}$,
the profit from $u_i$ is $-c_a$; let the expected profit from other nodes be $Y_0$. Notice that $Y_1$ depends on the
influence of $u_i$ but $Y_0$ does not. Putting it all together, $\pi(S \cup \{u_i\}, \mathbf{p}_{-i} \oplus p_i)$ can be expressed
as a function of $p_i$ as follows:

$$g_i(p_i) = (1 - F_i(p_i)) \cdot (p_i + Y_1) + F_i(p_i) \cdot Y_0 - c_a. \qquad (4)$$

Similarly to the expected spread of influence in InfMax, the exact values of $Y_1$ and $Y_0$ cannot be computed
in PTIME (due to #P-hardness [6]), but sufficiently accurate estimates can be obtained by Monte Carlo
(MC) simulations.

Finding $p_i^* = \arg\max_{p_i \in [0,1]} g_i(p_i)$ now depends on the specific form of the distribution function $F_i$. We
consider two kinds of distributions: the normal distribution, for which $v_i \sim \mathcal{N}(\mu, \sigma^2)$, $\forall u_i \in V$, and the
uniform distribution, for which $v_i \sim \mathcal{U}(0, 1)$, $\forall u_i \in V$. The choice of the normal distribution is supported
by evidence from real-world data from Epinions.com (see Sec. 5), and also by work in [14]. When sales data
are not available, it is common to consider the uniform distribution with support $[0, 1]$ to account for our
complete lack of knowledge [3, 21].



The Normal Distribution Case. For the normal distribution, assume that $v_i \sim \mathcal{N}(\mu, \sigma^2)$ for some $\mu$ and $\sigma$;
then $\forall p_i \in [0, 1]$,

$$F_i(p_i) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{p_i - \mu}{\sqrt{2}\,\sigma}\right)\right],$$

where $\operatorname{erf}(\cdot)$ is the error function, defined as

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt.$$

Plugging $F_i(\cdot)$ back into Eq. (4), one cannot obtain an analytical solution for $p_i^*$, as $\operatorname{erf}(x)$ has no closed-form
expression. Thus, we turn to numerical methods to approximately find $p_i^*$. Specifically, we use the golden
section search algorithm, a technique that finds the extremum of a unimodal function by iteratively shrinking
the interval inside which the extremum is known to exist [9]. In our case, the search algorithm starts with
the interval $[0, 1]$, and we set the stopping criterion to be that the size of the interval which contains $p_i^*$ is
strictly smaller than $10^{-8}$.
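A minimal sketch of this numerical step is given below, assuming that $Y_1$, $Y_0$, and $c_a$ have already been estimated and are passed in as plain numbers. The function names are hypothetical and the routine is a textbook golden section search, not necessarily the exact implementation used in our experiments.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0   # 1/phi, about 0.618

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (math.sqrt(2.0) * sigma)))

def g(p, cdf, y1, y0, c_a):
    """Eq. (4): expected profit when the candidate seed is offered price p."""
    f = cdf(p)
    return (1.0 - f) * (p + y1) + f * y0 - c_a

def golden_section_max(func, lo=0.0, hi=1.0, tol=1e-8):
    """Maximize a unimodal `func` on [lo, hi]; returns (argmax, iterations).

    The loop stops once the bracketing interval is strictly smaller than tol.
    """
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = func(c), func(d)
    iterations = 0
    while b - a >= tol:
        if fc > fd:                      # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = func(c)
        else:                            # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = func(d)
        iterations += 1
    return (a + b) / 2.0, iterations

# Illustrative inputs only: Y1 = 0.625, Y0 = 0, c_a = 0.001.
p_star, n_iter = golden_section_max(
    lambda p: g(p, lambda x: normal_cdf(x, 0.53, 0.14), 0.625, 0.0, 0.001))
```

Each iteration shrinks the interval by the factor $1/\varphi \approx 0.618$ and reuses one of the two interior evaluations, so only one new evaluation of $g_i$ is needed per iteration.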

The Uniform Distribution Case. The uniform distribution has easier calculations and analytical solutions.
If $v_i \sim \mathcal{U}(0, 1)$, then $\forall p_i \in [0, 1]$, $F_i(p_i) = p_i$, and plugging it back into Eq. (4) gives

$$g_i(p_i) = -p_i^2 + (1 - Y_1 + Y_0) \cdot p_i + Y_1 - c_a.$$

Hence, the optimal price is

$$p_i^* = \frac{1 - Y_1 + Y_0}{2}.$$

For both normal and uniform distributions, if $p_i^* > 1$ or $p_i^* < 0$, it is normalized back to 1 or 0, respectively.
Also note that the above solution framework applies to any probability distribution that $v_i$ may follow, as
long as an analytical or numerical solution can be found for $p_i^*$.
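For completeness, the uniform-case optimum can be derived from Eq. (4) in a few steps. The derivation below uses only the symbols already defined in this section; the numerical check reuses the single-seed example of Sec. 4.1 ($Y_1 = 0.625$, $Y_0 = 0$).

```latex
\begin{align*}
g_i(p_i) &= (1 - p_i)(p_i + Y_1) + p_i Y_0 - c_a
          = -p_i^2 + (1 - Y_1 + Y_0)\,p_i + Y_1 - c_a, \\
g_i'(p_i) &= -2 p_i + (1 - Y_1 + Y_0) = 0
  \;\Longrightarrow\; p_i^* = \frac{1 - Y_1 + Y_0}{2}, \\
g_i''(p_i) &= -2 < 0 \quad \text{(so $g_i$ is concave and $p_i^*$ is a maximizer)}, \\
\text{e.g., } Y_1 &= 0.625,\; Y_0 = 0
  \;\Longrightarrow\; p_i^* = \frac{1 - 0.625}{2} = \frac{3}{16}.
\end{align*}
```

Because $g_i$ is concave, clipping an out-of-range $p_i^*$ to the nearest endpoint of $[0, 1]$ yields the constrained maximizer, which is why the normalization step loses nothing in the uniform case.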

To conclude this section, Steps 5-8 in Algorithm 5 (and also the U-Greedy seed selection procedure in
All-OMP and FFS) can be accelerated by the CELF optimization [18], or the more recent CELF++ [11]. The
adaptation is straightforward and the details can be found in [18] and [11].

Algorithm 5: PAGE ($G = (V, E)$, $\pi$, $F_i$ ($\forall u_i \in V$))

1 $S \leftarrow \emptyset$; $\mathbf{p} \leftarrow \mathbf{0}$;
2 foreach $u_i \in V$ do
3 |   $\mathbf{p}[i] \leftarrow p_i^m = \arg\max_{p \in [0,1]} p \cdot (1 - F_i(p))$;
4 while true do
5 |   foreach $u_i \in V \setminus S$ do
6 |   |   Estimate the values of $Y_0$ and $Y_1$ by MC simulations;
7 |   |   $p_i^* \leftarrow \arg\max_{p_i \in [0,1]} g_i(p_i)$, normalize if needed;
8 |   $u \leftarrow \arg\max_{u_i \in V \setminus S} g_i(p_i^*)$;
9 |   if $\pi(S \cup \{u\}, \mathbf{p}_{-i} \oplus p_i^*) - \pi(S, \mathbf{p}_{-i} \oplus p_i^m) > 0$ then
10 |   |   $S \leftarrow S \cup \{u\}$; $\mathbf{p} \leftarrow \mathbf{p}_{-i} \oplus p_i^*$;
11 |   else break;
12 Output $S$, $\mathbf{p}$;
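To make the CELF-style acceleration concrete, the following Python sketch shows U-Greedy with lazy evaluation for a fixed price vector (the All-OMP setting). It is a simplified illustration, not the implementation of [18]: `profit` is a hypothetical oracle returning the estimated profit of a seed set, and it is assumed to be submodular so that stale marginal gains are valid upper bounds.

```python
import heapq

def lazy_u_greedy(nodes, profit):
    """U-Greedy with CELF-style lazy evaluation.

    nodes: iterable of comparable node ids.
    profit: hypothetical oracle, frozenset of seeds -> estimated profit,
            assumed submodular in the seed set.
    Seeds are added while the best marginal profit is strictly positive.
    """
    seeds = frozenset()
    base = profit(seeds)
    # Max-heap via negated gains; each entry stores the round in which its
    # marginal gain was last computed.
    heap = [(-(profit(frozenset([u])) - base), u, 0) for u in nodes]
    heapq.heapify(heap)
    round_no = 0
    while heap:
        neg_gain, u, stamp = heapq.heappop(heap)
        if stamp == round_no:
            # The gain is up to date; by submodularity no stale entry can
            # beat it, so u is the greedy choice for this round.
            if -neg_gain <= 0:
                break
            seeds = seeds | {u}
            base += -neg_gain
            round_no += 1
        else:
            # Stale gain: recompute against the current seed set, reinsert.
            gain = profit(seeds | {u}) - base
            heapq.heappush(heap, (-gain, u, round_no))
    return seeds

# Toy oracle: coverage minus a linear seeding cost (submodular, non-monotone).
covers = {0: {1, 2, 3}, 1: {3, 4}, 2: {4}, 3: set()}

def toy_profit(seed_set):
    covered = set().union(*(covers[u] for u in seed_set)) if seed_set else set()
    return len(covered) - 0.5 * len(seed_set)

chosen = lazy_u_greedy(range(4), toy_profit)
```

For PAGE, the same queue discipline would apply with the price-optimized marginal profit of Step 7 in place of the fixed-price marginal profit; that variant is not shown here.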



Table 1: Statistics of Network Data.

Dataset                   Epinions   Flixster   NetHEPT
Number of nodes           11K        7.6K       15K
Number of edges           119K       50K        62K
Average out-degree        10.7       6.5        4.12
Maximum out-degree        1208       197        64
#Connected components     4603       761        1781
Largest component size    5933       2861       6794



5 Empirical Evaluations 

We conduct experiments on real-world network datasets to evaluate our proposed baselines and the PAGE
algorithm. In all these algorithms, a key step is to compute the marginal profit of a candidate seed. As
mentioned in Sec. 3, computing the exact expected profit is intractable for the LT-V model. Thus, we
estimate the expected profit with Monte Carlo (MC) simulations. Following [16], we run 10,000 simulations
for this purpose. This is an expensive step and, as for InfMax, it limits the size of networks on which we can
run these simulations. For the same reason, the CELF optimization is used in all algorithms as a heuristic.
All implementations are in C++ and all experiments were run on a server with 2.50GHz eight-core Intel
Xeon E5420 CPU, 16GB RAM, and Windows Server 2008 R2.
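As an illustration of the MC estimation step (our actual implementation is in C++), the following Python sketch simulates the LT-V process and averages the realized profit. It rests on assumptions that should be read as such: thresholds are drawn uniformly from $[0, 1]$ as in the LT model, an influenced node adopts when its price is below its sampled valuation, only adopters propagate influence, and the cost $c_a$ is paid once per seed. All names are hypothetical.

```python
import random

def simulate_ltv_once(out_edges, seeds, prices, sample_valuation, c_a, rng):
    """One run of the LT-V process; returns the realized profit.

    out_edges: dict node -> list of (out_neighbor, weight).
    sample_valuation: hypothetical callable (node, rng) -> valuation.
    """
    theta = {v: rng.random() for v in out_edges}   # thresholds ~ U[0, 1)
    weight_in = {v: 0.0 for v in out_edges}        # weight from adopting in-neighbors
    influenced = set(seeds)
    profit = -c_a * len(seeds)                     # acquisition cost per seed
    stack = []
    for s in seeds:
        if prices[s] < sample_valuation(s, rng):   # the seed adopts
            profit += prices[s]
            stack.append(s)
    while stack:
        u = stack.pop()
        for v, w in out_edges[u]:
            if v in influenced:
                continue
            weight_in[v] += w
            if weight_in[v] >= theta[v]:           # v becomes influenced
                influenced.add(v)
                if prices[v] < sample_valuation(v, rng):
                    profit += prices[v]            # v adopts and pays its price
                    stack.append(v)                # only adopters propagate
    return profit

def estimate_profit(out_edges, seeds, prices, sample_valuation, c_a,
                    runs=10_000, seed=0):
    """Average realized profit over `runs` independent simulations."""
    rng = random.Random(seed)
    total = sum(simulate_ltv_once(out_edges, seeds, prices,
                                  sample_valuation, c_a, rng)
                for _ in range(runs))
    return total / runs
```

The marginal profit of a candidate seed is then the difference between two such estimates, which is why the number of simulations dominates the running time.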



5.1 Dataset Preparations 

Network Data. We use three network datasets whose statistics are summarized in Table 1. They include:
(a) Epinions [20], a who-trust-whom network extracted from the review site Epinions.com: an edge $(u_i, u_j)$ is
present if $u_j$ has expressed her trust in $u_i$'s reviews; (b) Flixster⁵, a friendship network from the social movie
site Flixster.com: if $u_i$ and $u_j$ are friends, we have edges in both directions; (c) NetHEPT (standard for
InfMax [5, 6, 12, 16])⁶, a co-authorship network extracted from the High Energy Physics Theory section of
arXiv.org: if $u_i$ and $u_j$ have co-authored papers, we have edges in both directions. The raw data of Epinions
and Flixster contain 76K users, 509K edges and 1M users, 28M edges, respectively. We use the METIS graph
partition software⁷ to extract a subgraph for both networks, to ensure that MC simulations can finish in a
reasonable amount of time.



⁵ http://www2.cs.sfu.ca/~sja25/personal/datasets/ Ratings timestamped

⁶ http://research.microsoft.com/en-us/people/weic/projects.aspx

⁷ http://glaros.dtc.umn.edu/gkhome/views/metis

Figure 3: Distribution of influence weights in Flixster



Influence Weights. We use two methods, Weighted Distribution (WD) and Trivalency (TV), to assign
influence weights to edges. For WD, $w_{i,j} = A_{i,j}/N_j$, where $A_{i,j}$ is the number of actions $u_i$ and $u_j$ both perform,
and $N_j$ is a normalization factor, i.e., the number of actions performed by $u_j$, to ensure $\sum_{u_i \in N^{in}(u_j)} w_{i,j} \le 1$.
In Flixster, $A_{i,j}$ is the number of movies $u_j$ rated after $u_i$; in NetHEPT, $A_{i,j}$ is the number of papers $u_i$
and $u_j$ co-authored; in Epinions, since no action data is available, we use $w_{i,j} = 1/d^{in}(u_j)$ as an
approximation. For TV, $w_{i,j}$ is selected uniformly at random from $\{0.001, 0.01, 0.1\}$, and is normalized to ensure
$\sum_{u_i \in N^{in}(u_j)} w_{i,j} \le 1$. Fig. 3 illustrates the distribution of weights for Flixster; it shows that influence is
higher in WD graphs than in TV graphs.
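The two weight assignments that need no action log can be sketched as follows. The graph encoding and function names are hypothetical, and the TV normalization rule in particular (rescale a node's in-weights only when their sum exceeds 1) is an assumption made for this sketch; the text above only requires that the in-weights of each node sum to at most 1.

```python
import random

def indegree_weights(in_neighbors):
    """w(u, v) = 1 / d_in(v): the approximation used when no action data exist.

    in_neighbors: dict node -> list of in-neighbors.
    Returns a dict (u, v) -> weight.
    """
    return {(u, v): 1.0 / len(nbrs)
            for v, nbrs in in_neighbors.items() for u in nbrs}

def trivalency_weights(in_neighbors, rng, levels=(0.001, 0.01, 0.1)):
    """Pick each edge weight uniformly from `levels`; rescale the in-weights
    of a node only if they sum to more than 1 (assumed normalization)."""
    weights = {}
    for v, nbrs in in_neighbors.items():
        raw = {u: rng.choice(levels) for u in nbrs}
        total = sum(raw.values())
        scale = 1.0 / total if total > 1.0 else 1.0
        for u, w in raw.items():
            weights[(u, v)] = w * scale
    return weights
```

With at most ten in-neighbors the TV weights are never rescaled under this rule, so most nodes keep the raw values, which is consistent with influence being lower in TV graphs than in WD graphs.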

Valuation Distributions. As mentioned in Sec. 1, valuations are difficult to obtain directly from users,
and we have to estimate the distribution using historical sales data. In an Epinions.com review, a user
provides an integer rating from 1 to 5, and may optionally report the price she paid in US dollars (see,
e.g., http://tinyurl.com/773to53). If a review contains both price and rating, we can combine them to
approximately estimate the valuation of that user, as in such systems, ratings are seen as people's utility for
a good, and utility is the difference of valuation and price [21].

We observed that most products have only a limited number (< 100) of reviews, and thus a single product 
may not provide enough samples. To circumvent this difficulty, we acquired all reviews for the popular Canon 
EOS 300D, 350D, and 400D cameras. Given that these cameras followed a sequential release within a short 
time span (three years), we treated them as having similar monetary values to consumers. After removing 
reviews without prices reported, we are left with 276 samples. Next, we transform prices and ratings to 
obtain estimated valuations as follows: 



valuation = price * (1 + rating / 5). 



We then normalize the results into $[0, 1]$ and fit the data to a normal distribution $\mathcal{N}(\mu, \sigma^2)$ with $\mu = 0.53$
and $\sigma = 0.14$ estimated by maximum likelihood estimation (MLE). Fig. 4(a) plots the histogram of the
normalized valuations; Fig. 4(b) presents the CDFs of our empirical data and $\mathcal{N}(0.53, 0.14^2)$. To test the
goodness of fit, we compute the Kolmogorov-Smirnov (K-S) statistic [10] of the two distributions, which is
defined as the maximum difference between the two CDFs; in our case, the K-S statistic is 0.1064. As can
be seen from Fig. 4(b), $\mathcal{N}(0.53, 0.14^2)$ is indeed a good fit for the estimated valuations of the three Canon
EOS cameras on Epinions.com.
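The estimation pipeline described above can be sketched in a few lines of Python. The review data are not reproduced here, so the inputs in the usage note are made-up placeholders; the normalization rule (divide by the largest estimated valuation) is also an assumption, since the text only states that results are normalized into $[0, 1]$.

```python
import math

def estimate_valuations(prices, ratings):
    """valuation = price * (1 + rating / 5), then scaled into [0, 1]."""
    raw = [p * (1.0 + r / 5.0) for p, r in zip(prices, ratings)]
    top = max(raw)
    return [x / top for x in raw]       # assumed normalization: divide by max

def fit_normal_mle(xs):
    """MLE for a normal distribution: sample mean and 1/n standard deviation."""
    mu = sum(xs) / len(xs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return mu, sigma

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (math.sqrt(2.0) * sigma)))

def ks_statistic(xs, cdf):
    """Largest gap between the empirical CDF of xs and the reference cdf."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from k/n to (k+1)/n at x; check both sides.
        d = max(d, (k + 1) / n - f, f - k / n)
    return d
```

A hypothetical call would be `vals = estimate_valuations([500, 650, 700], [3, 4, 5])`, followed by `mu, sigma = fit_normal_mle(vals)` and `ks_statistic(vals, lambda x: normal_cdf(x, mu, sigma))`.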

Since there are no price data to be collected in Flixster and NetHEPT, we use $\mathcal{N}(0.53, 0.14^2)$ in the
simulations for all datasets. In addition, for completeness, we also test with the uniform distribution over
$[0, 1]$, i.e., $\mathcal{U}(0, 1)$, as it is commonly assumed in the literature [3, 21].

Figure 4: Statistics of Valuations (Epinions.com)



5.2 Experimental Results 

We compare PAGE, All-OMP, and FFS in terms of the expected profit achieved, price assignments, and
running time. Although all algorithms employ U-Greedy, which does not terminate until the marginal profit
is no longer positive, for uniformity, we report simulation results up to 100 seeds.

Figure 5: Expected profit achieved (Y-axis) on Epinions graphs w.r.t. $|S|$ (X-axis). (N)/(U) denotes
normal/uniform distribution.

Figure 6: Expected profit achieved (Y-axis) on Flixster graphs w.r.t. $|S|$ (X-axis). (N)/(U) denotes
normal/uniform distribution.



Expected Profit Achieved. The quality of outputs (seed sets and price vectors) of All-OMP, FFS, and
PAGE for general ProMax is evaluated based on the expected profit achieved. Figs. 5, 6, and 7 illustrate the
results on Epinions, Flixster, and NetHEPT, respectively. On each network, both valuation distributions are
tested in four settings: WD weights with $c_a = 0.1$ and 0.001; TV weights with $c_a = 0.1$ and 0.001. As prices
and valuations are in $[0, 1]$, we use 0.1 to simulate high acquisition costs and 0.001 for low costs. Except
for NetHEPT-TV with $c_a = 0.1$ and 0.001 (Fig. 7), FFS is better than All-OMP; this indicates
that only in NetHEPT-TV is influence low enough that blindly giving free samples to all seeds impairs
profits.

In all test cases, PAGE performed consistently better than FFS and All-OMP. The margin between PAGE
and FFS is higher in TV graphs (by, e.g., 15% on Epinions-TV with $\mathcal{N}(0.53, 0.14^2)$, $c_a = 0.1$) than
in WD graphs (by, e.g., 2.1% on Epinions-WD with $\mathcal{N}(0.53, 0.14^2)$, $c_a = 0.1$), as higher influence in WD
graphs can potentially compensate more for the profit lost on seeds under FFS. Also, the expected profit
of all algorithms under $\mathcal{N}(0.53, 0.14^2)$ is higher than that under $\mathcal{U}(0, 1)$, since adoption probabilities under
$\mathcal{N}(0.53, 0.14^2)$ are higher.

Price Assignments. For $\mathcal{N}(0.53, 0.14^2)$ and $\mathcal{U}(0, 1)$, the OMP is 0.41 and 0.5, respectively. Fig. 8
demonstrates the prices offered to each seed by All-OMP, FFS, and PAGE on Epinions-TV with $\mathcal{N}(0.53, 0.14^2)$.⁸
All-OMP and FFS assign 0.41 and 0 to all seeds, respectively. For PAGE, as the seed set grows, the price tends
to increase, reflecting the intuition that the discount is proportional to the influence (profit potential) of seeds,
as they are added in a greedy fashion and those added later have diminishing profit potential.

Running Time. Tables 2 and 3 present the running time of all algorithms on the three networks with
WD weights and TV weights, respectively.⁹ As adoption probabilities under $\mathcal{N}(0.53, 0.14^2)$ are higher, all
algorithms ran longer with the normal distribution on all graphs. Similarly, as influence in WD graphs is
higher, the running time on them is longer than that on TV graphs.

⁸ Similar results can be seen in other cases, which we omit here.
⁹ The results for $c_a = 0.001$ are similar and are omitted here.

Figure 7: Expected profit achieved (Y-axis) on NetHEPT graphs w.r.t. $|S|$ (X-axis). (N)/(U) denotes
normal/uniform distribution.

Figure 8: Price assigned to seeds (Y-axis) w.r.t. $|S|$ (X-axis) on Epinions-TV with $\mathcal{N}(0.53, 0.14^2)$.

All-OMP and FFS have roughly the same running time. More interestingly, PAGE is faster than both
baselines in all cases. The observation is that in each round of U-Greedy, PAGE maximizes the marginal
profit for each candidate seed in the priority queue maintained by CELF. Thus, heuristically, the
lazy-forward procedure in CELF (see [18]) has a better chance to return the best candidate seed sooner for
PAGE. All-OMP and FFS also benefit from CELF, but since the marginal profits of candidate seeds are often
suboptimal, elements in the CELF queue tend to be clustered, and thus the lazy-forward is not as effective.
Besides, for PAGE under $\mathcal{N}(0.53, 0.14^2)$, the golden section search usually converges in fewer than 40 iterations
with stopping criterion $10^{-8}$ (defined in Sec. 4.2); thus the extra overhead it brings is negligible compared to
MC simulations.

Table 2 [WD weights; entries beyond the first are missing]

Algorithm    Epinions-WD    Flixster-WD    NetHEPT-WD
             N     U        N     U        N     U
All-OMP      6.7   ...

Table 3: Running time in hours (TV weights, $c_a = 0.1$)

To conclude, our empirical results on three real-world datasets with two different valuation distributions 
demonstrate that the PAGE algorithm consistently outperforms baselines All-OMP and FFS in both expected 
profit achieved and running time. It is also the most robust (against various inputs) among all algorithms. 



6 Conclusions and Discussions 

In this work, we extend the classical LT model by incorporating prices and valuations to capture monetary 
aspects in product adoption, which we distinguish from social influence. We study the profit maximization 
(ProMax) problem under our proposed LT-V model, and prove NP-hardness and submodularity results. 
We propose the PAGE algorithm which dynamically determines the prices for nodes based on their profit 
potential. Our experimental results show that PAGE outperforms the baselines in all aspects evaluated. 

For future work, first, the added ingredients for LT-V can be used to extend models like IC [16] and
LT-C [2]. Second, the current algorithms cannot scale to larger graphs due to expensive MC simulations. To
achieve scalability, we can replace the MC simulations with fast heuristics for the LT model, e.g., LDAG [6]
and SimPath [12].

Another extension is to consider users' spontaneous interests in product adoption, and incorporate them into
the LT-V model for profit maximization. Due to personal demand, a user may have spontaneous interest
in a certain product even when no neighbor in the network has adopted. To model this, each node $u_i$ is
associated with a "network-less" probability $\delta_i$ [7]. An inactive node becomes influenced when the sum of
$\delta_i$ and the total influence from its adopting neighbors is at least $\theta_i$. A marketing company can thus wait
for spontaneous adopters to emerge first and propagate their adoption (for $t$ time steps, where $t$ is the
diameter of $G$), and then deploy a viral marketing campaign to maximize the expected profit. Our analysis
and solution framework (Secs. 3, 4) can be naturally applied to this setting.

In addition, it is interesting to look into more sophisticated methodologies to acquire knowledge on user
valuations, e.g., by leveraging users' full previous transaction history, as well as to look at real datasets besides
Epinions.com.